Multiple regression is formally known as the ordinary multiple linear regression model.
What a mouthful! Here’s what the terms mean:
Ordinary: The outcome variable is a continuous numerical variable whose random fluctuations
are normally distributed (see Chapter 24 for more about normal distributions).
Multiple: The model has more than one predictor variable.
Linear: Each predictor variable is multiplied by a parameter, and these products are added
together to estimate the predicted value of the outcome variable. You can also have one more
parameter thrown in that isn’t multiplied by anything; it’s called the constant term or the
intercept. The following are examples of linear functions used in regression:
Y = a + bX (This is the straight-line model from Chapter 16, where X is the predictor
variable, Y is the outcome, and a and b are parameters.)
Y = a + bX + cX² + dZ³ (In this multiple regression model, variables can be squared or
cubed. But as long as they’re multiplied by a coefficient, which is a slope from the model,
and the products are added together, the function is still considered linear in the
parameters.)
Y = a + bX + cZ + dXZ (This multiple regression model is special because of the XZ term,
which can be written as X × Z, and is called an interaction. It’s where you multiply two
predictors together to create a new interaction term in the model.)
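To make the interaction model concrete, here’s a minimal sketch in Python that fits Y = a + bX + cZ + dXZ by least squares. The data values are made up purely for illustration, and NumPy’s lstsq routine stands in for whatever fitting procedure your statistical software uses.

```python
import numpy as np

# Made-up illustrative data: outcome Y, predictors X and Z.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Z = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([3.1, 4.9, 9.2, 10.8, 17.1, 18.0])

# Design matrix for Y = a + bX + cZ + dXZ: a column of 1s for the
# intercept a, then X, Z, and the interaction column X*Z.
design = np.column_stack([np.ones_like(X), X, Z, X * Z])

# Least-squares fit; coefs holds the estimates of a, b, c, and d.
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b, c, d = coefs
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}, d = {d:.2f}")
```

Notice that the interaction column is literally X multiplied by Z, row by row, and that dropping the column of 1s would give you a no-intercept model like the Y = 0 + X + Z + X * Z form described below.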
In textbooks and published articles, you may see regression models written in various ways:
A collection of predictor variables may be designated by a subscripted variable and the
corresponding coefficients by another subscripted variable, like this:
Y = b₀ + b₁X₁ + b₂X₂ + b₃X₃ + …
In practical research work, the variables are often given meaningful names, like Age, Gender,
Height, Weight, Glucose, and so on.
Linear models may be represented in a shorthand notation that shows only the variables, and not
the parameters, like this: Y = X + Z + X * Z instead of Y = a + bX + cZ + dX * Z. Writing Y = 0 + X +
Z + X * Z specifies that the model has no intercept. And sometimes you’ll see a “~” instead of the
“=”. If you do, read the “~” as “is a function of” or “is predicted by.”
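If your statistical software accepts this kind of formula notation, the “~” shorthand maps directly onto code. The following sketch uses Python’s statsmodels formula interface with a hypothetical data frame; the variable names and values are made up for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with meaningful variable names (values are made up).
df = pd.DataFrame({
    "Weight": [68, 75, 82, 90, 77, 85, 73, 88],
    "Height": [165, 172, 180, 188, 170, 178, 169, 185],
    "Age":    [34, 41, 29, 52, 46, 38, 55, 31],
})

# "Weight ~ Height + Age" reads as "Weight is predicted by Height and Age."
# The intercept is included automatically; "Weight ~ 0 + Height + Age"
# would drop it, matching the Y = 0 + X + Z form in the text.
model = smf.ols("Weight ~ Height + Age", data=df).fit()
print(model.params)  # intercept plus one coefficient per predictor
```

In this notation, writing Height*Age expands to Height + Age + Height:Age, so the asterisk gives you both main effects and the interaction term in one stroke.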
Being aware of how the calculations work
Fitting a linear multiple regression model essentially involves creating a set of simultaneous equations,
one for each parameter in the model. The equations involve the parameters from the model and the
sums of various products of the dependent and independent variables. This is also true of the
simultaneous equations for the straight-line regression in Chapter 16, which involve estimating the
slope and intercept of the straight line and the sums of X, Y, X², and XY. Your statistical software
solves these simultaneous equations to obtain the parameter values, just as is done in straight-line
regression.
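As a rough sketch of what happens under the hood, the code below builds those simultaneous equations (often called the normal equations) from sums of squares and cross-products, then solves them directly. The data are simulated so the true parameters are known, and NumPy stands in for the internals of your statistical package.

```python
import numpy as np

# Simulated data with known "true" parameters: intercept 2.0,
# slopes 1.5 and -0.7, plus a little random noise.
rng = np.random.default_rng(0)
X1 = rng.normal(size=20)
X2 = rng.normal(size=20)
Y = 2.0 + 1.5 * X1 - 0.7 * X2 + rng.normal(scale=0.1, size=20)

# Design matrix: a column of 1s for the intercept, then the predictors.
design = np.column_stack([np.ones(20), X1, X2])

# The normal equations (X'X)b = X'Y are built from sums of squares and
# cross-products of the variables; there is one equation per parameter.
xtx = design.T @ design
xty = design.T @ Y

# Solving the three simultaneous equations gives the parameter estimates.
params = np.linalg.solve(xtx, xty)
print(params)  # close to [2.0, 1.5, -0.7]
```

With three parameters (one intercept and two slopes), this is a system of three equations in three unknowns; each predictor you add to the model adds one more equation to the system.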